Discovering phrases in machine translation by simulated annealing

Authors

  • Caroline Lavecchia
  • David Langlois
  • Kamel Smaïli
Abstract

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those determined by inter-lingual triggers. The best phrases are those which improve translation quality in terms of Bleu score. Experiments are carried out on the proceedings of the European Parliament corpus. Training uses a corpus of 596K French-English parallel sentences, and testing uses a corpus of 1444 sentences. With only 8.1% of the identified source phrases occurring in the test corpus, our system outperforms the baseline model by almost 3 points.
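
The selection step described in the abstract can be illustrated with a minimal sketch of simulated annealing over subsets of candidate phrase pairs. This is not the authors' implementation: the function names, the cooling schedule, the neighbor move (toggling one candidate) and the toy scoring table are all assumptions made for illustration. In the paper's setting the score would be a Bleu evaluation of the resulting translation system; here a made-up integer gain per hypothetical phrase pair stands in for it.

```python
import math
import random


def simulated_annealing(candidates, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a subset of `candidates` that maximizes `score(subset)`."""
    rng = random.Random(seed)
    current = set()                      # start with no phrase pair selected
    current_score = score(current)
    best, best_score = set(current), current_score
    for step in range(steps):
        # Assumed schedule: geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Assumed neighbor move: toggle one random candidate in or out.
        neighbor = set(current)
        neighbor.symmetric_difference_update({rng.choice(candidates)})
        neighbor_score = score(neighbor)
        delta = neighbor_score - current_score
        # Always accept improvements; accept a worse subset with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = neighbor, neighbor_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score


# Hypothetical phrase pairs with made-up gains; a stand-in for a Bleu evaluation.
TOY_GAINS = {
    ("parlement européen", "european parliament"): 3,
    ("ordre du jour", "agenda"): 2,
    ("ordre du jour", "order of the day"): -2,
    ("prendre la parole", "take the floor"): 1,
    ("prendre la parole", "take the word"): -1,
}


def toy_score(subset):
    """Sum the made-up gains of the selected phrase pairs."""
    return sum(TOY_GAINS[pair] for pair in subset)


best, best_score = simulated_annealing(list(TOY_GAINS), toy_score)
```

Early on, the high temperature lets the search accept subsets that lower the score, which helps it leave poor local optima; as the temperature falls, the search becomes close to greedy. Tracking the best subset seen so far means the returned score is never lower than that of the empty starting subset.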

Similar resources

Phrase-Based Machine Translation based on Simulated Annealing

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem. For that we use simulated annealing algorithm to f...

Machine Translation Using Overlapping Alignments and SampleRank

We present a conditional-random-field approach to discriminatively trained phrase-based machine translation in which training and decoding are both cast in a sampling framework and are implemented uniformly in a new probabilistic programming language for factor graphs. In traditional phrase-based translation, decoding infers both a "Viterbi" alignment and the target sentence. In contrast, in our...

A simulated annealing algorithm to determine a group layout and production plan in a dynamic cellular manufacturing system

In this paper, a mixed-integer linearized programming (MINLP) model is presented to design a group layout (GL) of a cellular manufacturing system (CMS) in a dynamic environment while considering production planning (PP) decisions. This model incorporates an extensive coverage of important manufacturing features used in the design of CMSs. There are also some features that make the presented...

Hybrid artificial immune system and simulated annealing algorithms for solving hybrid JIT flow shop with parallel batches and machine eligibility

This research deals with a hybrid flow shop scheduling problem with parallel batching, machine eligibility, unrelated parallel machines, and different release dates to minimize the sum of the total weighted earliness and tardiness (ET) penalties. In the parallel batching situation, it is assumed that a number of machines in some stages can perform a certain number of jobs simultaneously. First...

Multimodal Comparable Corpora as Resources for Extracting Parallel Data: Parallel Phrases Extraction

Discovering parallel data in comparable corpora is a promising approach for overcoming the lack of parallel texts in statistical machine translation and other NLP applications. In this paper we propose an alternative to comparable corpora of texts as resources for extracting parallel data: a multimodal comparable corpus of audio and texts. We present a novel method to detect parallel phrases fr...

Publication date: 2008